A Frame Tracking Model for Memory-Enhanced Dialogue Systems
Recently, resources and tasks were proposed to go beyond state tracking in
dialogue systems. An example is the frame tracking task, which requires
recording multiple frames, one for each user goal set during the dialogue. This
allows a user, for instance, to compare items corresponding to different goals.
This paper proposes a model which takes as input the list of frames created so
far during the dialogue, the current user utterance as well as the dialogue
acts, slot types, and slot values associated with this utterance. The model
then outputs the frame being referenced by each triple of dialogue act, slot
type, and slot value. We show that on the recently published Frames dataset,
this model significantly outperforms a previously proposed rule-based baseline.
In addition, we provide an extensive analysis of the frame tracking task by
dividing it into sub-tasks and assessing their difficulty with respect to our
model.
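The input/output contract described in this abstract can be sketched as follows. This is only an illustrative, rule-style assignment and is neither the paper's learned model nor its rule-based baseline; the names `Frame`, `assign_frames`, and `active_id`, and the slot names in the usage example, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One user goal: a frame id plus the slot values recorded for it."""
    frame_id: int
    slots: dict = field(default_factory=dict)  # slot type -> slot value


def assign_frames(frames, triples, active_id):
    """Return one frame id per (dialogue act, slot type, slot value) triple.

    Hypothetical heuristic: pick the most recently created frame that already
    holds the same slot value; otherwise fall back to the active frame.
    """
    assigned = []
    for _act, slot_type, slot_value in triples:
        match = active_id
        if slot_value is not None:
            for frame in reversed(frames):
                if frame.slots.get(slot_type) == slot_value:
                    match = frame.frame_id
                    break
        assigned.append(match)
    return assigned


# Toy usage with made-up frames and triples.
frames = [
    Frame(1, {"dst_city": "Paris", "budget": "2000"}),
    Frame(2, {"dst_city": "Rome", "budget": "2000"}),
]
triples = [
    ("inform", "dst_city", "Paris"),   # refers back to the first frame
    ("request", "price", None),        # no value: falls back to the active frame
    ("inform", "budget", "2000"),      # ambiguous: most recent match wins
]
result = assign_frames(frames, triples, active_id=2)
```

A learned model would replace the matching heuristic with a scoring function over all candidate frames; the sketch is only meant to make the shape of the inputs and outputs concrete.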
Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation
Automated metrics such as BLEU are widely used in the machine translation
literature. They have also been used recently in the dialogue community for
evaluating dialogue response generation. However, previous work in dialogue
response generation has shown that these metrics do not correlate strongly with
human judgment in the non task-oriented dialogue setting. Task-oriented
dialogue responses are expressed on narrower domains and exhibit lower
diversity. It is thus reasonable to think that these automated metrics would
correlate well with human judgment in the task-oriented setting where the
generation task consists of translating dialogue acts into a sentence. We
conduct an empirical study to confirm whether this is the case. Our findings
indicate that these automated metrics have stronger correlation with human
judgments in the task-oriented setting compared to what has been observed in
the non task-oriented setting. We also observe that these metrics correlate
even better for datasets which provide multiple ground truth reference
sentences. In addition, we show that some of the currently available corpora
for task-oriented language generation can be solved with simple models and
advocate for more challenging datasets.
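The kind of correlation analysis this abstract refers to can be sketched with a rank correlation between per-response metric scores and human ratings. This is a minimal standard-library sketch, assuming Spearman correlation as the statistic (the abstract does not say which one was used); the function names and the score lists are hypothetical toy values, not results from the paper.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values for illustration only: one metric score and one human rating
# per generated response.
metric_scores = [0.10, 0.35, 0.35, 0.80, 0.95]
human_ratings = [1, 3, 2, 4, 5]
rho = spearman(metric_scores, human_ratings)
```

In practice a library routine such as `scipy.stats.spearmanr` would normally be used instead of a hand-written version.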
Pseudointelligence: A Unifying Framework for Language Model Evaluation
With large language models surpassing human performance on an increasing
number of benchmarks, we must take a principled approach for targeted
evaluation of model capabilities. Inspired by pseudorandomness, we propose
pseudointelligence, which captures the maxim that "(perceived) intelligence
lies in the eye of the beholder". That is, claims of intelligence are
meaningful only when their evaluator is taken into account. Concretely, we
propose a complexity-theoretic framework of model evaluation cast as a dynamic
interaction between a model and a learned evaluator. We demonstrate that this
framework can be used to reason about two case studies in language model
evaluation, as well as to analyze existing evaluation methods.
Comment: EMNLP 2023 Finding
Parity-detection-based Mach-Zehnder interferometry with coherent and non-Gaussian squeezed vacuum states as inputs
We theoretically explore the advantages rendered by non-Gaussian operations
in phase estimation using a parity-detection-based Mach-Zehnder interferometer,
with one input being a coherent state and the other being a non-Gaussian
squeezed vacuum state (SVS). We consider a realistic model to perform three
different non-Gaussian operations, namely photon subtraction, photon addition,
and photon catalysis on a single-mode SVS. We start by deriving the Wigner
function of the non-Gaussian SVSs, which is then utilized to derive the
expression for the phase sensitivity. The analysis of the phase sensitivity
reveals that all three different non-Gaussian operations can enhance the phase
sensitivity under suitable choices of parameters. We also consider the
probabilistic nature of these non-Gaussian operations, the results of which
reveal single-photon addition to be the optimal operation. Further, our
analysis also enables us to identify the optimal squeezing of the SVS and the
transmissivity of the beam splitter involved in the implementation of the
non-Gaussian operations.
Comment: This is the fourth article in a publication series written in
celebration of the completion of 15 years of IISER Mohal
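For orientation, the standard route from a parity measurement to a phase sensitivity is the error-propagation formula below. It is a textbook relation rather than the paper's specific result; the symbols $\hat{a}$ (annihilation operator of the detected output mode) and $\phi$ (the interferometer phase) are notation assumed here. The expectation value $\langle\hat{\Pi}\rangle$ is proportional to the output-mode Wigner function at the phase-space origin, with a prefactor that depends on the normalization convention, which is why a Wigner-function derivation leads directly to the sensitivity.

```latex
% Parity operator on the detected output mode; it squares to the identity.
\hat{\Pi} = (-1)^{\hat{a}^{\dagger}\hat{a}}, \qquad \hat{\Pi}^{2} = \hat{1}

% Its variance therefore follows from the mean alone.
(\Delta\hat{\Pi})^{2}
  = \langle\hat{\Pi}^{2}\rangle - \langle\hat{\Pi}\rangle^{2}
  = 1 - \langle\hat{\Pi}\rangle^{2}

% Error propagation gives the phase sensitivity.
\Delta\phi
  = \frac{\Delta\hat{\Pi}}{\left|\partial\langle\hat{\Pi}\rangle/\partial\phi\right|}
  = \frac{\sqrt{1 - \langle\hat{\Pi}\rangle^{2}}}
         {\left|\partial\langle\hat{\Pi}\rangle/\partial\phi\right|}
```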
Pushdown Layers: Encoding Recursive Structure in Transformer Language Models
Recursion is a prominent feature of human language, and fundamentally
challenging for self-attention due to the lack of an explicit recursive-state
tracking mechanism. Consequently, Transformer language models poorly capture
long-tail recursive structure and exhibit sample-inefficient syntactic
generalization. This work introduces Pushdown Layers, a new self-attention
layer that models recursive state via a stack tape that tracks estimated depths
of every token in an incremental parse of the observed prefix. Transformer LMs
with Pushdown Layers are syntactic language models that autoregressively and
synchronously update this stack tape as they predict new tokens, in turn using
the stack tape to softly modulate attention over tokens -- for instance,
learning to "skip" over closed constituents. When trained on a corpus of
strings annotated with silver constituency parses, Transformers equipped with
Pushdown Layers achieve dramatically better and 3-5x more sample-efficient
syntactic generalization, while maintaining similar perplexities. Pushdown
Layers are a drop-in replacement for standard self-attention. We illustrate
this by finetuning GPT2-medium with Pushdown Layers on an automatically parsed
WikiText-103, leading to improvements on several GLUE text classification
tasks.
Comment: Accepted at EMNLP 2023 (Long Papers
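To make the notion of per-token depth concrete, the sketch below computes the nesting depth of each token in a fully bracketed constituency parse. It is a simplified illustration only: it is not the paper's stack-tape update, which is maintained incrementally over prefixes, and the function name `token_depths` and the example sentence are hypothetical.

```python
def token_depths(bracketed):
    """Return (token, depth) pairs for an unlabeled bracketed parse.

    Depth is the number of constituents still open when the token is read.
    Raises ValueError if the brackets are unbalanced.
    """
    depth = 0
    pairs = []
    pieces = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    for piece in pieces:
        if piece == "(":
            depth += 1
        elif piece == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing bracket")
        else:
            pairs.append((piece, depth))
    if depth != 0:
        raise ValueError("unclosed bracket")
    return pairs


# Toy usage: the object noun phrase sits one level deeper than the verb.
depths = token_depths("((the dog) (chased (the cat)))")
```

A depth signal of this kind is the sort of information an attention layer could use to treat tokens inside already-closed constituents differently from tokens in constituents that are still open.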
Grokking of Hierarchical Structure in Vanilla Transformers
For humans, language production and comprehension are sensitive to the
hierarchical structure of sentences. In natural language processing, past work
has questioned how effectively neural sequence models like transformers capture
this hierarchical structure when generalizing to structurally novel inputs. We
show that transformer language models can learn to generalize hierarchically
after training for extremely long periods -- far beyond the point when
in-domain accuracy has saturated. We call this phenomenon structural
grokking. On multiple datasets, structural grokking exhibits inverted U-shaped
scaling in model depth: intermediate-depth models generalize better than both
very deep and very shallow transformers. When analyzing the relationship
between model-internal properties and grokking, we find that optimal depth for
grokking can be identified using the tree-structuredness metric of
Murty et al. (2023). Overall, our work provides strong evidence that,
with extended training, vanilla transformers discover and use hierarchical
structure.
Comment: ACL 202